Search Results
Search for: All records
Total Resources: 4
Filter by Author / Creator:
- Cutkosky, Ashok (2)
- Orabona, Francesco (2)
- Aybat, Necdet Serhat (1)
- Boloni, Ladislau (1)
- Fallah, Alireza (1)
- Gurbuzbalaban, Mert (1)
- Jun, Kwang-Sung (1)
- Khodadadeh, Siavash (1)
- Ozdaglar, Asuman (1)
- Shah, Mubarak (1)
Filter by Editor (punctuation variants of the same names merged):
- Beygelzimer, A. (4)
- Garnett, R. (4)
- Larochelle, H. (4)
- Wallach, H. (4)
- Fox, E. (3)
- d'Alché-Buc, F. (3)
-
Wallach, H.; Larochelle, H.; Beygelzimer, A.; Fox, E.; Garnett, R. (Ed.)
-
Jun, Kwang-Sung; Cutkosky, Ashok; Orabona, Francesco (Advances in Neural Information Processing Systems). Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Garnett, R. (Ed.)
In this paper, we consider nonparametric least squares regression in a Reproducing Kernel Hilbert Space (RKHS). We propose a new randomized algorithm that has optimal generalization error bounds with respect to the square loss, closing a long-standing gap between upper and lower bounds. Moreover, we show that our algorithm has faster finite-time and asymptotic rates on problems where the Bayes risk with respect to the square loss is small. We state our results using standard tools from the theory of least squares regression in RKHSs, namely, the decay of the eigenvalues of the associated integral operator and the complexity of the optimal predictor measured through the integral operator.
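For context, the classical baseline this line of work builds on is regularized kernel least squares (kernel ridge regression). Below is a minimal sketch of that baseline, not the paper's randomized algorithm; the Gaussian kernel, bandwidth, and regularization constant are illustrative assumptions.
```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    # Gram matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 * bandwidth^2)).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def fit(X, y, lam=1e-2, bandwidth=1.0):
    # Solve (K + n * lam * I) alpha = y; the RKHS estimator is
    # f(x) = sum_i alpha_i k(x_i, x).
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def predict(X_train, alpha, X_new, bandwidth=1.0):
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha

# Toy usage: recover y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
alpha = fit(X, y)
print(np.abs(predict(X, alpha, X) - np.sin(X[:, 0])).mean())
```
The eigenvalue decay mentioned in the abstract refers to the spectrum of the Gram matrix K (equivalently, the integral operator of the kernel): faster decay makes the regression problem effectively lower-dimensional and yields faster rates.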
-
Cutkosky, Ashok; Orabona, Francesco (Advances in Neural Information Processing Systems). Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; Garnett, R. (Ed.)
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results. We present a new algorithm, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning. Our technique for removing the batches uses a variant of momentum to achieve variance reduction in non-convex optimization. On smooth losses $$F$$, STORM finds a point $$\boldsymbol{x}$$ with $$E[\|\nabla F(\boldsymbol{x})\|]\le O(1/\sqrt{T}+\sigma^{1/3}/T^{1/3})$$ in $$T$$ iterations with $$\sigma^2$$ variance in the gradients, matching the optimal rate and without requiring knowledge of $$\sigma$$.
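The momentum variant the abstract alludes to is the recursion $$d_t = \nabla f(\boldsymbol{x}_t, \xi_t) + (1-a)\bigl(d_{t-1} - \nabla f(\boldsymbol{x}_{t-1}, \xi_t)\bigr)$$, where each fresh sample $$\xi_t$$ is evaluated at both the current and previous iterate. A minimal sketch on a synthetic quadratic follows; the constant step size and momentum here are illustrative stand-ins for the adaptive schedules the paper derives.
```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x, xi):
    # Toy stochastic gradient: F(x) = 0.5 * ||x||^2, noise xi added to the gradient.
    return x + xi

def storm(x0, T=500, lr=0.05, a=0.3):
    # Sketch of the STORM-style estimator; lr and a are illustrative constants.
    x = np.asarray(x0, dtype=float)
    xi = rng.normal(size=x.shape)
    d = stoch_grad(x, xi)                 # d_1: a plain stochastic gradient
    for _ in range(T):
        x_new = x - lr * d                # descend along the corrected estimate
        xi = rng.normal(size=x.shape)     # ONE fresh sample, used at both iterates
        d = stoch_grad(x_new, xi) + (1 - a) * (d - stoch_grad(x, xi))
        x = x_new
    return x

print(storm(np.ones(10)))                 # converges near the minimizer at 0
```
Because the same sample $$\xi_t$$ appears in both gradient evaluations, the correction term has low variance, which is how the method avoids the "mega-batches" that earlier variance-reduction schemes required.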
-
Khodadadeh, Siavash; Boloni, Ladislau; Shah, Mubarak (Advances in Neural Information Processing Systems). Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-Buc, F.; Fox, E.; Garnett, R. (Ed.)